Efficient Exploitation of Hyper Loop Parallelism in Vectorization

Authors

  • Shixiong Xu
  • David Gregg
Abstract

Modern processors can provide large amounts of processing power through vector SIMD units, provided the compiler or programmer can vectorize the code. As SIMD support in commodity processors advances, increasingly sophisticated features are introduced, such as flexible SIMD lane-wise operations (e.g. blend instructions). However, existing vectorizing techniques fail to apply global SIMD lane-wise optimization because they are unaware of the computation structure of the vectorizable loop. In this paper, we put forward an approach to automatic vectorization based on hyper loop parallelism, which is exposed by hyper loops. Hyper loops recover the loop structures of the vectorizable loop and help the vectorizer apply global SIMD lane-wise optimization. We implemented our vectorizing technique in the Cetus source-to-source compiler to generate C code with SIMD intrinsics. Preliminary experimental results show that our vectorizing technique can achieve significant speedups over the non-vectorized code in our test cases.

Similar resources

Efficient Exploitation of Parallelism on Pentium III and Pentium 4 Processor-Based Systems

Systems based on the Pentium III and Pentium 4 processors enable the exploitation of parallelism at a fine- and medium-grained level. Dual- and quad-processor systems, for example, enable the exploitation of medium-grained parallelism by using multithreaded code that takes advantage of multiple control and arithmetic logic units. Streaming Single-Instruction-Multiple-Data (SIMD) extensions, on the o...


Exploiting Outer Loops Vectorization in High Level Synthesis

Synthesis of DoAll loops is a key aspect of High Level Synthesis since they make it easy to exploit the potential parallelism provided by programmable devices. This type of parallelism can be implemented in several ways: by duplicating the implementation of the loop body, by exploiting loop pipelining or by applying vectorization. In this paper a methodology for the synthesis of complex DoAll loops...


Portable Support and Exploitation of Nested Parallelism in OpenMP

In this paper, we present an alternative implementation of the NANOS OpenMP runtime library (NthLib) that targets portability and efficient support of multiple levels of parallelism. We have implemented the runtime libraries of available opensource OpenMP compilers on top of NthLib, reducing thus their overheads and providing them with inherent support for nested parallelism. In addition, we pr...


Vector Microprocessors for Desktop Computing

Desktop workloads are expected to shift over the next few years to become increasingly media-centric. These multimedia applications require much larger computational demands than current desktop processors can provide. In this paper, we describe four major requirements that we believe any effective desktop processor should address: it should meet the performance requirements of desktop workloads...


Vectorization of Multigrid Codes Using SIMD ISA Extensions

Motivated by the recent trend towards small-scale SIMD processing, we have addressed in this paper the vectorization of multigrid codes on modern microprocessors. The aim is to demonstrate that this relatively new feature can be beneficial not only for multimedia programs but also for such numerical codes. As target kernels we have considered both standard and robust multigrid algorithms, which...




Publication date: 2014